Text File | 1994-11-02 | 4.4 KB | 131 lines | [ttro/ttxt]

_________________________________________________

Tokenize Scripting Addition   ver. 1.1
Copyright (C) 1994 Wayne Walrath
_________________________________________________

This software is free for personal use. To obtain a cheap and simple license for corporate, commercial or institutional use, contact the author at one of the addresses listed at the end of this document.

THIS SOFTWARE IS PROVIDED AS IS WITHOUT WARRANTIES. USE AT YOUR OWN RISK!

You are encouraged to share this software with other people and to upload it to online services, but you may not charge money for it and you should only transfer the complete package. Contact me if you doubt whether you have a complete package. Inclusion on CD-ROMs requires explicit permission from me (the author).

Tokenize was designed to make it easier to split text into elements based on a set of delimiters. The demo AppleScript included with the distribution contains many examples of Tokenize's usage, including several novel uses which may not be obvious at first glance.

INSTALLATION:
______________________

To install: Drag Tokenize to the Scripting Additions folder inside the Extensions folder.

BACKGROUND INFORMATION
______________________

Because of the way the tokenization is implemented, Tokenize can also be used as a quick way of removing unwanted characters from a text string. To better understand what is possible with Tokenize, here's a brief description of how it functions.

The text to be tokenized is scanned for each of the strings given in the delimiter list, and all occurrences of these strings are replaced by a special character (essentially a null-char). After all delimiters are processed, a final pass is made which gathers all the strings between the special characters into a list. Understanding this algorithm will help you figure out how text will ultimately be parsed when using Tokenize.
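
The two passes described above can be modeled in a few lines of Python. This is only a sketch of the described behavior, not the addition's actual code: the names tokenize, strip_chars and SENTINEL are made up for illustration, processing the delimiters in list order is an assumption, and dropping empty tokens is inferred from the tab example in this document rather than stated outright.

```python
SENTINEL = "\0"  # stand-in for the "special character"; assumed absent from the input


def tokenize(text, delimiters):
    """Model of the two-pass scheme described in this README (not the author's code)."""
    # AppleScript coerces a lone string to a one-item list; mimic that here.
    if isinstance(delimiters, str):
        delimiters = [delimiters]

    # Pass 1: take the delimiters in list order (an assumption) and overwrite
    # every occurrence of each one with a single sentinel character.
    for delim in delimiters:
        if delim:  # an empty delimiter would match everywhere, so skip it
            text = text.replace(delim, SENTINEL)

    # Pass 2: gather the runs of text between sentinels. Empty runs are
    # dropped, which is inferred from the README's tab example, where
    # adjacent delimiters produce no empty tokens.
    return [token for token in text.split(SENTINEL) if token]


def strip_chars(text, unwanted):
    """Remove unwanted substrings by tokenizing and rejoining the pieces."""
    return "".join(tokenize(text, unwanted))
```

With this model, tokenize("One\tGiant\t\tStep", "\t") returns ['One', 'Giant', 'Step'], and strip_chars("a-b_c", ["-", "_"]) returns 'abc', which is the "removing unwanted characters" use mentioned above.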

For example, consider an arbitrary string of text which contains words separated by tab characters, with one to three tabs between each pair of words. Here's a string set up as described:

    set testString to "One\tGiant\t\tStep\tFor\t\t\tMankind"

If I tokenize this string using tab as the only delimiter, it returns this list:

    tokenize testString with delimiters tab
    => {"One", "Giant", "Step", "For", "Mankind"}

If, on the other hand, I tokenize using a string of three tabs, the output is different:

    tokenize testString with delimiters tab & tab & tab
    => {"One\tGiant\t\tStep\tFor", "Mankind"}

The output from this version is a list of two strings. Since Tokenize found only one place in testString where three tab characters sit side by side, it split the string there and left the remaining tabs in place. Tokenizing with a two-tab string would produce yet another result.

USAGE:
______________________

    tokenize <a String> with delimiters { [<sep. string 1>] [,<sep. string 2>]... }

The direct parameter to tokenize is a string, and the second (required) parameter is a list of strings (each one or more bytes in length) to use in tokenizing the direct parameter. If you are only tokenizing with one delimiter you need not pass it as a list, since AppleScript will handle the coercion for you. For example, the following is legal:

    tokenize "My Name Is" with delimiters " Name "
    => {"My", "Is"}

Some text processing tasks require more than one call to Tokenize. As an example, if the variable myText contained a number of lines separated by return characters, and you wanted to retrieve the words from line five, you could write the following AppleScript commands:

    tokenize myText with delimiters {return}
    tokenize (item 5 of result) with delimiters {space}
    => [result is a list with all the words from line five of the text]

______________________

Comments, bug reports and suggestions are welcomed.
If you have any ideas for useful Scripting Additions which haven't been written yet, send me a message describing your idea.

VERSION HISTORY:
______________________

VER 1.1 - 2Nov94
    Fixed a bug which surfaced in ver 1.0 when tokens were longer than 255 chars. The bug resulted in random memory being stomped on when a token was too long. All users of version 1.0 should switch immediately.
    Cleaned up and optimized the code a bit.

VER 1.0 - Oct94
    First release.

___________________________

Wayne Walrath
2010 Ravenswood Dr.
Evansville, IN 47714
(812) 476-8610
walrath@cs.indiana.edu
CIS: 70233,3151
___________________________